
    The distribution of amorphous computer outputs

    Fitness distributions (landscapes) of programs tend to a limit as they get bigger. Markov minorization gives upper bounds ((15.3 + 2.30m)/log I) on the length of program run on random or average computing devices, where I is the size of the instruction set and m the size of the output register. Almost all programs are constants. Convergence is exponential, with 90% of programs of length 1.6 n 2^N yielding constants (n = size of the input register, N = size of memory). This is supported by experiment.
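
    As a rough illustration (a minimal sketch, not code from the paper; the register and memory sizes below and the use of a natural log are assumptions), the two bounds can be evaluated directly in Python:

        import math

        # Upper bound (15.3 + 2.30*m)/log(I) on the expected program run length;
        # m = output register size (bits), I = instruction set size.
        def run_length_upper_bound(m, I):
            return (15.3 + 2.30 * m) / math.log(I)

        # Program length at which ~90% of random programs yield constants:
        # 1.6 * n * 2**N, with n = input register size, N = memory size.
        def convergence_length(n, N):
            return 1.6 * n * 2 ** N

        # Illustrative (hypothetical) register and memory sizes:
        print(run_length_upper_bound(m=8, I=16))  # ~12.2 instructions
        print(convergence_length(n=8, N=8))       # 3276.8 instructions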

    Repeated patterns in tree genetic programming

    We extend our analysis of repetitive patterns found in genetic programming genomes to tree-based GP. As in linear GP, repetitive patterns are present in large numbers. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail, e.g. using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, syntactic and semantic fitness correlations, and diffuse introns. We relate this emergent phenomenon to considerations about building blocks in GP and how GP works.
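
    A minimal sketch of subtree matching (an illustration under an assumed tuple encoding of trees, not the paper's implementation): canonicalise each subtree to a string and count how often it recurs; subtrees seen more than once are candidate repeated motifs.

        from collections import Counter

        def subtree_strings(tree, counter):
            if isinstance(tree, tuple):  # internal node: (op, child, ...)
                s = "(" + tree[0] + " " + " ".join(
                    subtree_strings(c, counter) for c in tree[1:]) + ")"
            else:                        # leaf: terminal symbol
                s = str(tree)
            counter[s] += 1
            return s

        prog = ("+", ("*", "x", "x"), ("+", ("*", "x", "x"), "1"))
        counts = Counter()
        subtree_strings(prog, counts)
        print([(s, c) for s, c in counts.items() if c > 1])
        # [('x', 4), ('(* x x)', 2)]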

    Genetic programming in data mining for drug discovery

    Genetic programming (GP) is used to extract from rat oral bioavailability (OB) measurements simple, interpretable and predictive QSAR models which generalise both to rats and to marketed drugs in humans. Receiver Operating Characteristics (ROC) curves for the binary classifier produced by machine learning show no statistical difference between rats (albeit without known clearance differences) and man. Thus evolutionary computing offers the prospect of in silico ADME screening, e.g. for "virtual" chemicals, for pharmaceutical drug discovery.
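
    The ROC comparison can be sketched as follows (a minimal illustration with made-up scores, not the paper's data or code): the area under the ROC curve equals the Mann-Whitney probability that a positive example outscores a negative one.

        # AUC as the Mann-Whitney statistic: the probability that a randomly
        # chosen positive (high-OB) compound outscores a random negative one.
        def auc(scores_pos, scores_neg):
            wins = sum((p > n) + 0.5 * (p == n)
                       for p in scores_pos for n in scores_neg)
            return wins / (len(scores_pos) * len(scores_neg))

        # Hypothetical classifier scores, not data from the paper:
        print(auc([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.9375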

    Repeated sequences in linear genetic programming genomes

    Biological chromosomes are replete with repetitive sequences: microsatellites, SSR tracts, Alu elements, etc. in their DNA base sequences. We started looking for similar phenomena in evolutionary computation. First studies find copious repeated sequences, which can be hierarchically decomposed into shorter sequences, in programs evolved using both homologous and two-point crossover but not with headless chicken crossover or other mutations. In bloated programs the small number of effective or expressed instructions appear in both repeated and non-repeated code, hinting that building blocks or code reuse may evolve in unplanned ways. Mackey-Glass chaotic time series prediction and eukaryotic protein localisation (both previously used as artificial intelligence machine learning benchmarks) demonstrate evolution of Shannon information (entropy) and lead to models capable of lossy Kolmogorov compression. Our findings with diverse benchmarks and GP systems suggest this emergent phenomenon may be widespread in genetic systems.
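
    A minimal sketch of how repeated sequences can be detected in a linear genome (an n-gram count under an assumed instruction encoding, not the paper's method):

        from collections import Counter

        # Slide a window of length k over the genome and count each
        # instruction subsequence; counts > 1 indicate repeated sequences.
        def repeated_ngrams(genome, k):
            grams = Counter(tuple(genome[i:i + k])
                            for i in range(len(genome) - k + 1))
            return {g: c for g, c in grams.items() if c > 1}

        # Hypothetical register-machine genome, not from the paper:
        genome = ["ADD r1 r2", "MUL r1 r1", "ADD r1 r2", "MUL r1 r1",
                  "SUB r3 r1"]
        print(repeated_ngrams(genome, 2))
        # {('ADD r1 r2', 'MUL r1 r1'): 2}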

    Genetic Improvement of computational biology software

    There is a cultural divide between computer scientists and biologists that needs to be addressed. The two disciplines used to be quite unrelated, but many new research areas have arisen from their synergy. We selectively review two multi-disciplinary problems: dealing with contamination in sequencing data repositories and improving software using biology-inspired evolutionary computing. Through several examples, we show that ideas from biology may result in optimised code and provide surprising improvements that overcome challenges in speed and quality trade-offs. On the other hand, development of computational methods is essential for maintaining contamination-free databases. Computer scientists and biologists must always be sceptical of each other's data, just as they would be of their own.

    Evolving text classification rules with genetic programming

    We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.
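
    A minimal sketch of the fitness evaluation (the rule, the documents, and the use of F1 to combine precision and recall are assumptions for illustration; the paper's function set may differ):

        # A hypothetical evolved rule: a boolean combination of N-gram tests.
        def rule(doc):
            return ("oil" in doc and "barrel" in doc) or "opec" in doc

        # Fitness from precision and recall over the training documents.
        def f1_fitness(rule, docs, labels):
            tp = sum(1 for d, y in zip(docs, labels) if rule(d) and y)
            fp = sum(1 for d, y in zip(docs, labels) if rule(d) and not y)
            fn = sum(1 for d, y in zip(docs, labels) if not rule(d) and y)
            p = tp / (tp + fp) if tp + fp else 0.0
            r = tp / (tp + fn) if tp + fn else 0.0
            return 2 * p * r / (p + r) if p + r else 0.0

        docs = ["opec raises output", "oil barrel price up", "wheat crop report"]
        labels = [True, True, False]  # True = on-topic (hypothetical)
        print(f1_fitness(rule, docs, labels))  # 1.0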

    Memory with memory in genetic programming

    We introduce Memory with Memory Genetic Programming (MwM-GP), where we use soft assignments and soft return operations. Instead of having the new value completely overwrite the old value of registers or memory, soft assignments combine such values. Similarly, in soft return operations the value of a function node is a blend between the result of a calculation and previously returned results. In extensive empirical tests, MwM-GP almost always does as well as traditional GP, while significantly outperforming it in several cases. MwM-GP also tends to be far more consistent than traditional GP. The data suggest that MwM-GP works by successively refining an approximate solution to the target problem and that it is much less likely to have truly ineffective code. MwM-GP can continue to improve over time, but it is less likely to get the sort of exact solution that one might find with traditional GP.
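
    A minimal sketch of the soft-assignment idea (the blending rule and the gamma value are assumptions, not the paper's exact definition):

        # Soft assignment: blend the computed value into the register instead
        # of overwriting it (gamma is an assumed blending parameter; the
        # paper's exact combination rule may differ).
        def soft_assign(old, new, gamma=0.1):
            return (1 - gamma) * old + gamma * new

        reg = 0.0
        for computed in [5.0, 5.0, 5.0]:  # repeated writes drift toward 5.0
            reg = soft_assign(reg, computed)
        print(reg)  # 1.355: successive refinement rather than a hard overwrite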